Overview


Dataset statistics

Number of variables: 15
Number of observations: 459
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 328.7 KiB
Average record size in memory: 733.3 B
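
The last two figures appear to be related: dividing the total in-memory size by the number of observations gives the average record size. A quick check, assuming 1 KiB = 1024 B (the variable names are illustrative and not part of the report):

```python
# Figures copied from the dataset statistics.
total_size_kib = 328.7
n_observations = 459

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")  # prints 733.3 B
```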

Variable types

Text: 6
Categorical: 4
Numeric: 5

Alerts

attributes_similarity is highly overall correlated with similarity_media and 1 other field (High correlation)
expected_intent is highly overall correlated with find_intent and 1 other field (High correlation)
filters_similarity is highly overall correlated with similarity_media (High correlation)
find_intent is highly overall correlated with expected_intent and 1 other field (High correlation)
intent_similarity is highly overall correlated with similarity_media (High correlation)
similarity_media is highly overall correlated with attributes_similarity and 2 other fields (High correlation)
total_attributes is highly overall correlated with attributes_similarity and 2 other fields (High correlation)
intent_similarity is highly imbalanced (83.5%) (Imbalance)
user_msg has unique values (Unique)
total_attributes has 109 (23.7%) zeros (Zeros)
entity_similarity has 12 (2.6%) zeros (Zeros)

Reproduction

Analysis started: 2025-01-30 19:25:44.799310
Analysis finished: 2025-01-30 19:25:47.888948
Duration: 3.09 seconds
Software version: ydata-profiling v4.12.2
Download configuration: config.json
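
The reported duration can be recovered from the two timestamps. A minimal check using only the Python standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2025-01-30 19:25:44.799310")
finished = datetime.fromisoformat("2025-01-30 19:25:47.888948")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # prints 3.09 seconds
```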

Variables

user_msg
Text

Unique 

Distinct: 459
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 50.6 KiB

Length

Max length: 281
Median length: 97
Mean length: 47.592593
Min length: 8

Characters and Unicode

Total characters: 21845
Distinct characters: 71
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 459
Unique (%): 100.0%

Sample

1st row: add student with name=anderson martins gomes, age=20
2nd row: add subject with name=brazilian history, description=the history of brazil
3rd row: add teacher with name=paulo henrique, age=65, email=ph@uece.br
4th row: add subject with name=math, description=the best subject ever
5th row: add subject with name=math, description='the best subject ever!'

Value               Count  Frequency (%)
with                289    8.9%
the                 153    4.7%
add                 151    4.7%
article             121    3.7%
update              96     3.0%
setting             75     2.3%
show                74     2.3%
a                   74     2.3%
name                74     2.3%
and                 73     2.3%
Other values (612)  2050   63.5%

Most occurring characters

Value              Count  Frequency (%)
(space)            2768   12.7%
e                  2270   10.4%
t                  1975   9.0%
a                  1711   7.8%
i                  1248   5.7%
n                  1109   5.1%
d                  1002   4.6%
r                  940    4.3%
s                  913    4.2%
o                  834    3.8%
Other values (61)  7075   32.4%

Most occurring categories, scripts and blocks

Value      Count  Frequency (%)
(unknown)  21845  100.0%

Categories, scripts and blocks each contain the single value (unknown), so the most frequent characters per category, per script and per block repeat the "Most occurring characters" table.

expected_intent
Categorical

High correlation 

Distinct: 4
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 28.1 KiB

CREATE  169
READ    143
UPDATE  99
DELETE  48

Length

Max length: 6
Median length: 6
Mean length: 5.3769063
Min length: 4

Characters and Unicode

Total characters: 2468
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CREATE
2nd row: CREATE
3rd row: CREATE
4th row: CREATE
5th row: CREATE

Common Values

Value   Count  Frequency (%)
CREATE  169    36.8%
READ    143    31.2%
UPDATE  99     21.6%
DELETE  48     10.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Value   Count  Frequency (%)
create  169    36.8%
read    143    31.2%
update  99     21.6%
delete  48     10.5%

Most occurring characters

Value  Count  Frequency (%)
E      724    29.3%
A      411    16.7%
T      316    12.8%
R      312    12.6%
D      290    11.8%
C      169    6.8%
U      99     4.0%
P      99     4.0%
L      48     1.9%
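
The character counts follow directly from the category counts, because each category label contributes its letters once per occurrence. A short sketch that rebuilds the character table, the total character count and the mean length from the four Common Values counts (the variable names are illustrative):

```python
from collections import Counter

# Category counts copied from the Common Values table.
value_counts = {"CREATE": 169, "READ": 143, "UPDATE": 99, "DELETE": 48}

# Every occurrence of a label contributes each of its characters once.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
mean_length = total_chars / sum(value_counts.values())

for ch, n in char_counts.most_common():
    print(f"{ch} {n} {100 * n / total_chars:.1f}%")
```

With these inputs the sketch yields 724 occurrences of E, 2468 characters in total and a mean length of about 5.3769, in line with the figures reported for expected_intent.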

Most occurring categories, scripts and blocks

Value      Count  Frequency (%)
(unknown)  2468   100.0%

Categories, scripts and blocks each contain the single value (unknown), so the most frequent characters per category, per script and per block repeat the "Most occurring characters" table.

expected_class
Categorical

Distinct: 38
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Memory size: 28.6 KiB

article            149
student            109
pokemon            30
city               25
film               24
Other values (33)  122

Length

Max length: 14
Median length: 7
Mean length: 6.5555556
Min length: 3

Characters and Unicode

Total characters: 3009
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 11
Unique (%): 2.4%

Sample

1st row: student
2nd row: subject
3rd row: teacher
4th row: subject
5th row: subject

Common Values

Value              Count  Frequency (%)
article            149    32.5%
student            109    23.7%
pokemon            30     6.5%
city               25     5.4%
film               24     5.2%
company            18     3.9%
subject            17     3.7%
show               15     3.3%
teacher            7      1.5%
elf                6      1.3%
Other values (28)  59     12.9%

Length

[Figure: Histogram of lengths of the category]

Most occurring characters

Value              Count  Frequency (%)
t                  446    14.8%
e                  381    12.7%
c                  248    8.2%
i                  230    7.6%
a                  200    6.6%
l                  199    6.6%
r                  183    6.1%
n                  181    6.0%
s                  171    5.7%
u                  144    4.8%
Other values (16)  626    20.8%

Most occurring categories, scripts and blocks

Value      Count  Frequency (%)
(unknown)  3009   100.0%

Categories, scripts and blocks each contain the single value (unknown), so the most frequent characters per category, per script and per block repeat the "Most occurring characters" table.

expected_attributes
Text

Distinct: 189
Distinct (%): 41.2%
Missing: 0
Missing (%): 0.0%
Memory size: 37.4 KiB

Length

Max length: 282
Median length: 150
Mean length: 24.204793
Min length: 2

Characters and Unicode

Total characters: 11110
Distinct characters: 60
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 152
Unique (%): 33.1%

Sample

1st row: {'name': 'anderson martins gomes', 'age': '20'}
2nd row: {'name': 'brazilian history', 'description': 'the history of brazil'}
3rd row: {'name': 'paulo henrique', 'age': '65', 'email': 'ph@uece.br'}
4th row: {'name': 'math', 'description': 'the best subject ever'}
5th row: {'name': 'math', 'description': 'the best subject ever!'}

Value               Count  Frequency (%)
(blank)             200    14.3%
name                116    8.3%
title               50     3.6%
year                32     2.3%
author              30     2.1%
age                 22     1.6%
the                 19     1.4%
a                   16     1.1%
raichu              14     1.0%
co_author           14     1.0%
Other values (395)  885    63.3%

Most occurring characters

Value              Count  Frequency (%)
'                  1594   14.3%
(space)            939    8.5%
e                  875    7.9%
a                  764    6.9%
t                  535    4.8%
n                  534    4.8%
r                  532    4.8%
o                  466    4.2%
{                  459    4.1%
}                  459    4.1%
Other values (50)  3953   35.6%

Most occurring categories, scripts and blocks

Value      Count  Frequency (%)
(unknown)  11110  100.0%

Categories, scripts and blocks each contain the single value (unknown), so the most frequent characters per category, per script and per block repeat the "Most occurring characters" table.

expected_filter_attributes
Text

Distinct: 74
Distinct (%): 16.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.7 KiB

Length

Max length: 89
Median length: 2
Mean length: 8.6732026
Min length: 2

Characters and Unicode

Total characters: 3981
Distinct characters: 48
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 45
Unique (%): 9.8%

Sample

1st row: {}
2nd row: {}
3rd row: {}
4th row: {}
5th row: {}

Value               Count  Frequency (%)
(blank)             275    38.8%
id                  76     10.7%
name                73     10.3%
pikachu             16     2.3%
anderson            15     2.1%
1                   14     2.0%
8                   11     1.6%
year                10     1.4%
11                  10     1.4%
4                   10     1.4%
Other values (104)  198    28.0%

Most occurring characters

Value              Count  Frequency (%)
'                  744    18.7%
{                  459    11.5%
}                  459    11.5%
(space)            249    6.3%
e                  232    5.8%
a                  218    5.5%
:                  187    4.7%
i                  159    4.0%
n                  158    4.0%
r                  118    3.0%
Other values (38)  998    25.1%

Most occurring categories, scripts and blocks

Value      Count  Frequency (%)
(unknown)  3981   100.0%

Categories, scripts and blocks each contain the single value (unknown), so the most frequent characters per category, per script and per block repeat the "Most occurring characters" table.

processed_attributes
Text

Distinct: 191
Distinct (%): 41.6%
Missing: 0
Missing (%): 0.0%
Memory size: 42.4 KiB

Length

Max length: 677
Median length: 2
Mean length: 31.699346
Min length: 2

Characters and Unicode

Total characters: 14550
Distinct characters: 65
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 175
Unique (%): 38.1%

Sample

1st row: {'name': 'anderson martins gomes', 'age': '20'}
2nd row: {'name': 'brazilian history', 'history': 'history of'}
3rd row: {'name': 'paulo henrique', 'age': '65', 'email': 'ph@uecebr', '.': 'br'}
4th row: {'name': 'math', 'description': 'the best subject ever'}
5th row: {'name': 'math', 'description': "'the best subject ever!'"}

Value               Count  Frequency (%)
(blank)             328    18.4%
name                94     5.3%
title               47     2.6%
author              27     1.5%
a                   22     1.2%
the                 21     1.2%
measure             20     1.1%
with                19     1.1%
cluster             19     1.1%
age                 19     1.1%
Other values (399)  1164   65.4%

Most occurring characters

Value              Count  Frequency (%)
'                  1846   12.7%
(space)            1326   9.1%
e                  1138   7.8%
a                  936    6.4%
t                  755    5.2%
n                  687    4.7%
r                  686    4.7%
o                  614    4.2%
i                  564    3.9%
"                  513    3.5%
Other values (55)  5485   37.7%

Most occurring categories, scripts and blocks

Value      Count  Frequency (%)
(unknown)  14550  100.0%

Categories, scripts and blocks each contain the single value (unknown), so the most frequent characters per category, per script and per block repeat the "Most occurring characters" table.

processed_filters
Text

Distinct: 171
Distinct (%): 37.3%
Missing: 0
Missing (%): 0.0%
Memory size: 34.7 KiB

Length

Max length: 407
Median length: 298
Mean length: 18.960784
Min length: 2

Characters and Unicode

Total characters: 8703
Distinct characters: 58
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 145
Unique (%): 31.6%

Sample

1st row: {}
2nd row: {}
3rd row: {}
4th row: {}
5th row: {}

Value               Count  Frequency (%)
(blank)             299    24.6%
name                97     8.0%
id                  76     6.3%
students            40     3.3%
setting             33     2.7%
year                28     2.3%
rosenberg           18     1.5%
with                17     1.4%
author              17     1.4%
the                 16     1.3%
Other values (203)  574    47.2%

Most occurring characters

Value              Count  Frequency (%)
'                  1398   16.1%
(space)            757    8.7%
e                  611    7.0%
a                  468    5.4%
{                  459    5.3%
}                  459    5.3%
t                  446    5.1%
r                  375    4.3%
n                  367    4.2%
:                  353    4.1%
Other values (48)  3010   34.6%

Most occurring categories, scripts and blocks

Value      Count  Frequency (%)
(unknown)  8703   100.0%

Categories, scripts and blocks each contain the single value (unknown), so the most frequent characters per category, per script and per block repeat the "Most occurring characters" table.

find_entity
Text

Distinct: 66
Distinct (%): 14.4%
Missing: 0
Missing (%): 0.0%
Memory size: 28.6 KiB

Length

Max length: 45
Median length: 7
Mean length: 6.4749455
Min length: 1

Characters and Unicode

Total characters: 2972
Distinct characters: 34
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 30
Unique (%): 6.5%

Sample

1st row: student
2nd row: subject
3rd row: teacher
4th row: subject
5th row: subject

Value              Count  Frequency (%)
article            113    24.2%
student            79     17.0%
pokemon            24     5.2%
city               22     4.7%
articles           20     4.3%
students           20     4.3%
film               20     4.3%
subject            17     3.6%
show               15     3.2%
a                  12     2.6%
Other values (58)  124    26.6%

Most occurring characters

Value              Count  Frequency (%)
t                  429    14.4%
e                  375    12.6%
s                  237    8.0%
c                  225    7.6%
i                  223    7.5%
a                  208    7.0%
l                  183    6.2%
n                  172    5.8%
r                  164    5.5%
u                  136    4.6%
Other values (24)  620    20.9%

Most occurring categories, scripts and blocks

Value      Count  Frequency (%)
(unknown)  2972   100.0%

Categories, scripts and blocks each contain the single value (unknown), so the most frequent characters per category, per script and per block repeat the "Most occurring characters" table.

find_intent
Categorical

High correlation 

Distinct: 4
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 28.1 KiB

CREATE  158
READ    136
UPDATE  111
DELETE  54

Length

Max length: 6
Median length: 6
Mean length: 5.4074074
Min length: 4

Characters and Unicode

Total characters: 2482
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CREATE
2nd row: CREATE
3rd row: CREATE
4th row: CREATE
5th row: CREATE

Common Values

Value   Count  Frequency (%)
CREATE  158    34.4%
READ    136    29.6%
UPDATE  111    24.2%
DELETE  54     11.8%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Value   Count  Frequency (%)
create  158    34.4%
read    136    29.6%
update  111    24.2%
delete  54     11.8%

Most occurring characters

Value  Count  Frequency (%)
E      725    29.2%
A      405    16.3%
T      323    13.0%
D      301    12.1%
R      294    11.8%
C      158    6.4%
U      111    4.5%
P      111    4.5%
L      54     2.2%

Most occurring categories, scripts and blocks

Value      Count  Frequency (%)
(unknown)  2482   100.0%

Categories, scripts and blocks each contain the single value (unknown), so the most frequent characters per category, per script and per block repeat the "Most occurring characters" table.

total_attributes
Real number (ℝ)

High correlation  Zeros 

Distinct: 7
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.2766885
Minimum: 0
Maximum: 7
Zeros: 109
Zeros (%): 23.7%
Negative: 0
Negative (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 1
Q3: 2
95-th percentile: 3
Maximum: 7
Range: 7
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.99984777
Coefficient of variation (CV): 0.78315721
Kurtosis: 2.6774864
Mean: 1.2766885
Median Absolute Deviation (MAD): 1
Skewness: 0.92414809
Sum: 586
Variance: 0.99969556
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=7)]

Common Values

Value  Count  Frequency (%)
1      161    35.1%
2      160    34.9%
0      109    23.7%
3      17     3.7%
4      8      1.7%
5      3      0.7%
7      1      0.2%

Smallest values

Value  Count  Frequency (%)
0      109    23.7%
1      161    35.1%
2      160    34.9%
3      17     3.7%
4      8      1.7%
5      3      0.7%
7      1      0.2%

Largest values

Value  Count  Frequency (%)
7      1      0.2%
5      3      0.7%
4      8      1.7%
3      17     3.7%
2      160    34.9%
1      161    35.1%
0      109    23.7%
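
Because the frequency table for total_attributes lists all seven distinct values, the descriptive statistics can be recomputed from it. The sketch below uses Python's `statistics` module and assumes the report's variance is the sample variance (n - 1 denominator); that assumption is not stated in the report, but it reproduces the published figures.

```python
import statistics

# Value -> count, copied from the frequency table for total_attributes.
value_counts = {0: 109, 1: 161, 2: 160, 3: 17, 4: 8, 5: 3, 7: 1}

# Expand the frequency table back into the 459 individual observations.
values = [v for v, n in value_counts.items() for _ in range(n)]

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)
cv = std_dev / mean                     # coefficient of variation
median = statistics.median(values)

print(f"sum={sum(values)} mean={mean:.7f} variance={variance:.8f}")
print(f"std={std_dev:.8f} cv={cv:.8f} median={median}")
```

With the population variance (n denominator) the result would be slightly smaller than the reported 0.99969556, which is why the sample variance is used here.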

intent_similarity
Categorical

High correlation  Imbalance 

Distinct: 4
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 28.0 KiB

100.0               438
50.0                10
19.999999999999996  9
60.0                2

Length

Max length: 18
Median length: 5
Mean length: 5.2287582
Min length: 4

Characters and Unicode

Total characters: 2400
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 100.0
2nd row: 100.0
3rd row: 100.0
4th row: 100.0
5th row: 100.0

Common Values

Value               Count  Frequency (%)
100.0               438    95.4%
50.0                10     2.2%
19.999999999999996  9      2.0%
60.0                2      0.4%
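
The 83.5% in the Imbalance alert for intent_similarity is consistent with a common imbalance score: one minus the Shannon entropy of the class frequencies, normalized by the maximum entropy for the number of classes. The report does not state its formula, so the sketch below is an assumption that happens to reproduce the figure.

```python
import math

# Class counts copied from the Common Values table for intent_similarity.
counts = [438, 10, 9, 2]

total = sum(counts)
probs = [c / total for c in counts]

# Shannon entropy of the observed class distribution (natural log).
entropy = -sum(p * math.log(p) for p in probs)

# Assumed score: 0 for a perfectly balanced variable, 1 for a single class.
imbalance = 1 - entropy / math.log(len(counts))
print(f"{100 * imbalance:.1f}%")  # prints 83.5%
```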

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Most occurring characters

Value  Count  Frequency (%)
0      1338   55.8%
.      459    19.1%
1      447    18.6%
9      135    5.6%
6      11     0.5%
5      10     0.4%

Most occurring categories, scripts and blocks

Value      Count  Frequency (%)
(unknown)  2400   100.0%

Categories, scripts and blocks each contain the single value (unknown), so the most frequent characters per category, per script and per block repeat the "Most occurring characters" table.

entity_similarity
Real number (ℝ)

Zeros 

Distinct: 27
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 90.231687
Minimum: 0
Maximum: 100
Zeros: 12
Zeros (%): 2.6%
Negative: 0
Negative (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 22.222222
Q1: 93.333333
Median: 100
Q3: 100
95-th percentile: 100
Maximum: 100
Range: 100
Interquartile range (IQR): 6.6666667

Descriptive statistics

Standard deviation: 24.365125
Coefficient of variation (CV): 0.27002848
Kurtosis: 5.6882098
Mean: 90.231687
Median Absolute Deviation (MAD): 0
Skewness: -2.6469721
Sum: 41416.344
Variance: 593.65931
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=27)]

Common Values

Value              Count  Frequency (%)
100                340    74.1%
93.33333333        50     10.9%
0                  12     2.6%
22.22222222        9      2.0%
75                 6      1.3%
66.66666667        6      1.3%
18.18181818        4      0.9%
36.36363636        3      0.7%
25                 3      0.7%
46.15384615        3      0.7%
Other values (17)  23     5.0%

Smallest values

Value        Count  Frequency (%)
0            12     2.6%
14.28571429  1      0.2%
16.66666667  1      0.2%
18.18181818  4      0.9%
20           2      0.4%
22.22222222  9      2.0%
23.07692308  1      0.2%
25           3      0.7%
28.57142857  1      0.2%
29.26829268  1      0.2%

Largest values

Value        Count  Frequency (%)
100          340    74.1%
94.11764706  2      0.4%
93.33333333  50     10.9%
92.30769231  1      0.2%
90.90909091  2      0.4%
88.88888889  3      0.7%
85.71428571  1      0.2%
75           6      1.3%
73.68421053  1      0.2%
66.66666667  6      1.3%

attributes_similarity
Real number (ℝ)

High correlation 

Distinct: 153
Distinct (%): 33.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 80.80638
Minimum: 7.8431373
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 7.8431373
5-th percentile: 16.6
Q1: 69.742063
Median: 100
Q3: 100
95-th percentile: 100
Maximum: 100
Range: 92.156863
Interquartile range (IQR): 30.257937

Descriptive statistics

Standard deviation: 28.379442
Coefficient of variation (CV): 0.35120299
Kurtosis: 0.48403822
Mean: 80.80638
Median Absolute Deviation (MAD): 0
Skewness: -1.3557104
Sum: 37090.128
Variance: 805.39274
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value               Count  Frequency (%)
100                 244    53.2%
22.22222222         18     3.9%
12.5                4      0.9%
66.66666667         4      0.9%
16                  4      0.9%
95.23809524         4      0.9%
26.66666667         4      0.9%
16.66666667         3      0.7%
67.74193548         3      0.7%
95.83333333         3      0.7%
Other values (143)  168    36.6%

Smallest values

Value        Count  Frequency (%)
7.843137255  2      0.4%
8.333333333  1      0.2%
9.756097561  1      0.2%
10.52631579  1      0.2%
10.81081081  2      0.4%
11.42857143  2      0.4%
12.12121212  1      0.2%
12.5         4      0.9%
13.33333333  1      0.2%
13.79310345  2      0.4%

Largest values

Value        Count  Frequency (%)
100          244    53.2%
99.29078014  1      0.2%
98.27586207  1      0.2%
97.77777778  2      0.4%
97.56097561  1      0.2%
97.5         1      0.2%
97.48743719  1      0.2%
97.2972973   1      0.2%
96.96969697  1      0.2%
96.77419355  1      0.2%

filters_similarity
Real number (ℝ)

High correlation 

Distinct: 104
Distinct (%): 22.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 80.034335
Minimum: 3.7735849
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 3.7735849
5-th percentile: 22.105263
Q1: 57.894737
Median: 100
Q3: 100
95-th percentile: 100
Maximum: 100
Range: 96.226415
Interquartile range (IQR): 42.105263

Descriptive statistics

Standard deviation: 30.070405
Coefficient of variation (CV): 0.37571881
Kurtosis: -0.2980088
Mean: 80.034335
Median Absolute Deviation (MAD): 0
Skewness: -1.1360756
Sum: 36735.76
Variance: 904.22927
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value              Count  Frequency (%)
100                284    61.9%
22.22222222        22     4.8%
57.89473684        10     2.2%
95                 9      2.0%
25                 6      1.3%
28.57142857        4      0.9%
26.66666667        4      0.9%
20                 3      0.7%
19.04761905        3      0.7%
23.52941176        3      0.7%
Other values (94)  111    24.2%

Smallest values

Value        Count  Frequency (%)
3.773584906  1      0.2%
5.263157895  1      0.2%
5.333333333  1      0.2%
6.666666667  1      0.2%
7.272727273  1      0.2%
8.888888889  1      0.2%
10.81081081  1      0.2%
11.05990783  1      0.2%
11.42857143  1      0.2%
11.88118812  1      0.2%

Largest values

Value        Count  Frequency (%)
100          284    61.9%
96.77419355  1      0.2%
95.65217391  1      0.2%
95.45454545  1      0.2%
95.23809524  1      0.2%
95           9      2.0%
94.44444444  1      0.2%
94.11764706  1      0.2%
92.68292683  1      0.2%
92.30769231  1      0.2%

similarity_media
Real number (ℝ)

High correlation 

Distinct: 255
Distinct (%): 55.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 87.060039
Minimum: 24.479432
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.7 KiB

Quantile statistics

Minimum: 24.479432
5-th percentile: 57.688021
Q1: 78.347393
Median: 93.75
Q3: 98.913043
95-th percentile: 100
Maximum: 100
Range: 75.520568
Interquartile range (IQR): 20.56565

Descriptive statistics

Standard deviation: 14.838294
Coefficient of variation (CV): 0.17043748
Kurtosis: 0.69573559
Mean: 87.060039
Median Absolute Deviation (MAD): 6.25
Skewness: -1.1767138
Sum: 39960.558
Variance: 220.17497
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value               Count  Frequency (%)
100                 98     21.4%
98.33333333         36     7.8%
80.55555556         21     4.6%
70.02923977         10     2.2%
93.75               7      1.5%
98.80952381         4      0.9%
91.66666667         3      0.7%
92                  3      0.7%
98.95833333         3      0.7%
81.25               3      0.7%
Other values (245)  271    59.0%

Smallest values

Value        Count  Frequency (%)
24.47943226  1      0.2%
40.35421785  1      0.2%
41.25481386  1      0.2%
41.46491228  1      0.2%
42.88386212  1      0.2%
43.17144863  1      0.2%
43.95833333  1      0.2%
46.75830705  1      0.2%
49.52380952  2      0.4%
49.571949    1      0.2%

Largest values

Value        Count  Frequency (%)
100          98     21.4%
99.82269504  1      0.2%
99.56896552  1      0.2%
99.44444444  2      0.4%
99.3902439   1      0.2%
99.375       1      0.2%
99.3718593   1      0.2%
99.32432432  1      0.2%
99.24242424  1      0.2%
99.19354839  1      0.2%

Interactions

[Figures: 25 interaction plots]

Correlations

| | attributes_similarity | entity_similarity | expected_class | expected_intent | filters_similarity | find_intent | intent_similarity | similarity_media | total_attributes |
| attributes_similarity | 1.000 | -0.118 | 0.071 | 0.461 | 0.148 | 0.449 | 0.197 | 0.576 | -0.627 |
| entity_similarity | -0.118 | 1.000 | 0.362 | 0.132 | 0.011 | 0.131 | 0.000 | 0.310 | 0.270 |
| expected_class | 0.071 | 0.362 | 1.000 | 0.332 | 0.000 | 0.308 | 0.294 | 0.099 | 0.319 |
| expected_intent | 0.461 | 0.132 | 0.332 | 1.000 | 0.396 | 0.932 | 0.142 | 0.273 | 0.559 |
| filters_similarity | 0.148 | 0.011 | 0.000 | 0.396 | 1.000 | 0.419 | 0.480 | 0.718 | 0.039 |
| find_intent | 0.449 | 0.131 | 0.308 | 0.932 | 0.419 | 1.000 | 0.162 | 0.314 | 0.531 |
| intent_similarity | 0.197 | 0.000 | 0.294 | 0.142 | 0.480 | 0.162 | 1.000 | 0.536 | 0.038 |
| similarity_media | 0.576 | 0.310 | 0.099 | 0.273 | 0.718 | 0.314 | 0.536 | 1.000 | -0.148 |
| total_attributes | -0.627 | 0.270 | 0.319 | 0.559 | 0.039 | 0.531 | 0.038 | -0.148 | 1.000 |
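
The pairs that the Alerts section flags as highly correlated can be recovered from the matrix above. The sketch below lists every pair whose absolute coefficient is at least 0.5. That cutoff is an assumption: it happens to reproduce the alerts in this report, but the report itself does not state the threshold. The names COLUMNS, MATRIX, THRESHOLD and strong_pairs are illustrative only.

```python
# Column order and values copied from the correlation matrix above.
COLUMNS = [
    "attributes_similarity", "entity_similarity", "expected_class",
    "expected_intent", "filters_similarity", "find_intent",
    "intent_similarity", "similarity_media", "total_attributes",
]
MATRIX = [
    [1.000, -0.118, 0.071, 0.461, 0.148, 0.449, 0.197, 0.576, -0.627],
    [-0.118, 1.000, 0.362, 0.132, 0.011, 0.131, 0.000, 0.310, 0.270],
    [0.071, 0.362, 1.000, 0.332, 0.000, 0.308, 0.294, 0.099, 0.319],
    [0.461, 0.132, 0.332, 1.000, 0.396, 0.932, 0.142, 0.273, 0.559],
    [0.148, 0.011, 0.000, 0.396, 1.000, 0.419, 0.480, 0.718, 0.039],
    [0.449, 0.131, 0.308, 0.932, 0.419, 1.000, 0.162, 0.314, 0.531],
    [0.197, 0.000, 0.294, 0.142, 0.480, 0.162, 1.000, 0.536, 0.038],
    [0.576, 0.310, 0.099, 0.273, 0.718, 0.314, 0.536, 1.000, -0.148],
    [-0.627, 0.270, 0.319, 0.559, 0.039, 0.531, 0.038, -0.148, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff, not stated in the report


def strong_pairs(columns, matrix, threshold):
    """Return (col_a, col_b, r) for each pair with |r| >= threshold."""
    pairs = []
    for i in range(len(columns)):
        # The matrix is symmetric, so only the upper triangle is scanned.
        for j in range(i + 1, len(columns)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((columns[i], columns[j], matrix[i][j]))
    # Strongest relationships first, regardless of sign.
    return sorted(pairs, key=lambda p: -abs(p[2]))


for col_a, col_b, r in strong_pairs(COLUMNS, MATRIX, THRESHOLD):
    print(f"{col_a} ~ {col_b}: {r:+.3f}")
```

Under this assumed cutoff the strongest pair is expected_intent and find_intent (0.932), and the only strong negative relationship is attributes_similarity against total_attributes (-0.627).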

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

| | user_msg | expected_intent | expected_class | expected_attributes | expected_filter_attributes | processed_attributes | processed_filters | find_entity | find_intent | total_attributes | intent_similarity | entity_similarity | attributes_similarity | filters_similarity | similarity_media |
| 0 | add student with name=anderson martins gomes, age=20 | CREATE | student | {'name': 'anderson martins gomes', 'age': '20'} | {} | {'name': 'anderson martins gomes', 'age': '20'} | {} | student | CREATE | 2 | 100.0 | 100.000000 | 100.000000 | 100.000000 | 100.000000 |
| 1 | add subject with name=brazilian history, description=the history of brazil | CREATE | subject | {'name': 'brazilian history', 'description': 'the history of brazil'} | {} | {'name': 'brazilian history', 'history': 'history of'} | {} | subject | CREATE | 2 | 100.0 | 100.000000 | 81.300813 | 100.000000 | 95.325203 |
| 2 | add teacher with name=paulo henrique, age=65, email=ph@uece.br | CREATE | teacher | {'name': 'paulo henrique', 'age': '65', 'email': 'ph@uece.br'} | {} | {'name': 'paulo henrique', 'age': '65', 'email': 'ph@uecebr', '.': 'br'} | {} | teacher | CREATE | 3 | 100.0 | 100.000000 | 92.537313 | 100.000000 | 98.134328 |
| 3 | add subject with name=math, description=the best subject ever | CREATE | subject | {'name': 'math', 'description': 'the best subject ever'} | {} | {'name': 'math', 'description': 'the best subject ever'} | {} | subject | CREATE | 2 | 100.0 | 100.000000 | 100.000000 | 100.000000 | 100.000000 |
| 4 | add subject with name=math, description='the best subject ever!' | CREATE | subject | {'name': 'math', 'description': 'the best subject ever!'} | {} | {'name': 'math', 'description': "'the best subject ever!'"} | {} | subject | CREATE | 2 | 100.0 | 100.000000 | 98.275862 | 100.000000 | 99.568966 |
| 5 | add student name=anderson | CREATE | student | {'name': 'anderson'} | {} | {'name': 'anderson'} | {} | student | CREATE | 1 | 100.0 | 100.000000 | 100.000000 | 100.000000 | 100.000000 |
| 6 | get students with name anderson | READ | student | {} | {'name': 'anderson'} | {} | {'name': 'anderson'} | students | READ | 1 | 100.0 | 93.333333 | 100.000000 | 100.000000 | 98.333333 |
| 7 | show me the teachers | READ | teacher | {} | {} | {} | {} | teachers | READ | 0 | 100.0 | 93.333333 | 100.000000 | 100.000000 | 98.333333 |
| 8 | show teachers | READ | teacher | {} | {} | {} | {} | teachers | READ | 0 | 100.0 | 93.333333 | 100.000000 | 100.000000 | 98.333333 |
| 9 | show teacher | READ | teacher | {} | {} | {} | {'teacher': ''} | show | READ | 0 | 100.0 | 18.181818 | 100.000000 | 23.529412 | 60.427807 |
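
The report does not say how entity_similarity is computed. The values in the rows above are, however, consistent with a matching-characters ratio of the form 2·M/T scaled to 0-100, which is what Python's difflib.SequenceMatcher.ratio returns. The sketch below reproduces three of the sample scores under that assumption; the helper name entity_ratio is hypothetical and is not taken from the pipeline that produced this dataset.

```python
from difflib import SequenceMatcher


def entity_ratio(expected: str, found: str) -> float:
    """Similarity on a 0-100 scale: 200 * matches / (len(expected) + len(found)).

    Assumption: the dataset's entity_similarity behaves like this ratio.
    The actual metric used by the evaluated system is not documented here.
    """
    return 100.0 * SequenceMatcher(None, expected, found).ratio()


# (expected_class, find_entity) pairs taken from sample rows 0, 6 and 9.
for expected, found in [("student", "student"),
                        ("student", "students"),
                        ("teacher", "show")]:
    print(f"{expected!r} vs {found!r}: {entity_ratio(expected, found):.6f}")
```

A singular/plural mismatch such as student versus students costs only a few points, whereas picking the verb show as the entity shares a single character with teacher and scores near the bottom of the scale.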

| | user_msg | expected_intent | expected_class | expected_attributes | expected_filter_attributes | processed_attributes | processed_filters | find_entity | find_intent | total_attributes | intent_similarity | entity_similarity | attributes_similarity | filters_similarity | similarity_media |
| 449 | update article with id=12, setting revision=1 | UPDATE | article | {'revision': '1'} | {'id': '12'} | {'revision': '1'} | {'id': '12'} | article | UPDATE | 2 | 100.0 | 100.000000 | 100.000000 | 100.000000 | 100.000000 |
| 450 | add a movie with title='batman begins' | CREATE | film | {'title': 'batman begins'} | {} | {'title': "'batman begins'"} | {} | movie | CREATE | 1 | 100.0 | 22.222222 | 96.296296 | 100.000000 | 79.629630 |
| 451 | add a movie with title='the dark knight', "running time"=152 | CREATE | film | {'title': 'the dark knight', 'running_time': '152'} | {} | {'title': "'the dark knight'", 'running': 'time"152'} | {} | movie | CREATE | 2 | 100.0 | 22.222222 | 88.461538 | 100.000000 | 77.670940 |
| 452 | get all students with name='anderson' | READ | student | {} | {'name': 'anderson'} | {} | {'students': "with name'anderson'", 'name': "'anderson'"} | a | READ | 1 | 100.0 | 0.000000 | 100.000000 | 51.948052 | 62.987013 |
| 453 | add a former teacher with name='onelia' | CREATE | former_teacher | {'name': 'onelia'} | {} | {'name': "'onelia'", 'onelia': "'"} | {} | teacher | CREATE | 1 | 100.0 | 66.666667 | 67.924528 | 100.000000 | 83.647799 |
| 454 | add an invoice with value=100 | CREATE | invoice | {'value': '100'} | {} | {'value': '100'} | {} | invoice | CREATE | 1 | 100.0 | 100.000000 | 100.000000 | 100.000000 | 100.000000 |
| 455 | get all invoice | READ | invoice | {} | {} | {} | {} | invoice | READ | 0 | 100.0 | 100.000000 | 100.000000 | 100.000000 | 100.000000 |
| 456 | add car with license=2020 | CREATE | car | {'license': '2020'} | {} | {'license': '2020'} | {} | car | CREATE | 1 | 100.0 | 100.000000 | 100.000000 | 100.000000 | 100.000000 |
| 457 | add car with color=black | CREATE | car | {'color': 'black'} | {} | {'color': 'black'} | {} | car | CREATE | 1 | 100.0 | 100.000000 | 100.000000 | 100.000000 | 100.000000 |
| 458 | get all cars | READ | car | {} | {} | {} | {} | cars | READ | 0 | 100.0 | 85.714286 | 100.000000 | 100.000000 | 96.428571 |
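
In every sample row, similarity_media equals the arithmetic mean of the four component scores (intent, entity, attributes and filters). The sketch below checks this on three of the rows shown. Whether the column is defined this way for all 459 observations is an assumption, and the names SAMPLE_ROWS, mean_similarity and matches_reported_mean are illustrative only.

```python
# Component scores and reported similarity_media for sample rows 1, 9 and 452.
SAMPLE_ROWS = [
    {"intent": 100.0, "entity": 100.0, "attributes": 81.300813,
     "filters": 100.0, "media": 95.325203},
    {"intent": 100.0, "entity": 18.181818, "attributes": 100.0,
     "filters": 23.529412, "media": 60.427807},
    {"intent": 100.0, "entity": 0.0, "attributes": 100.0,
     "filters": 51.948052, "media": 62.987013},
]

COMPONENTS = ("intent", "entity", "attributes", "filters")


def mean_similarity(row):
    """Unweighted mean of the four component similarity scores."""
    return sum(row[name] for name in COMPONENTS) / len(COMPONENTS)


def matches_reported_mean(row, tolerance=1e-5):
    """True if the recomputed mean agrees with the reported value.

    The tolerance absorbs the six-decimal rounding used in the report.
    """
    return abs(mean_similarity(row) - row["media"]) < tolerance


for row in SAMPLE_ROWS:
    print(f"{mean_similarity(row):.6f} vs {row['media']:.6f} -> "
          f"{matches_reported_mean(row)}")
```

Because the mean is unweighted, a single failed component (for example entity_similarity of 0 in row 452) can lower similarity_media by at most 25 points.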